Programming Massively Parallel Processors: A Practical Approach: Beyond the Limits of Linear Arrays: Scaling to Multidimensional Data

Welcome to the great offload. In a CPU program, we define how to iterate; in general-purpose computing on GPUs (GPGPU), we define what a single iteration looks like. This shift from instruction-centric logic to data-centric logic is powered by the kernel abstraction.

1. The __global__ Blueprint

With the __global__ qualifier, you are not just writing a function; you are designing a scalable blueprint. Each execution of the kernel by one thread represents one independent unit of work, which lets the GPU handle thousands of identical tasks across a very large number of cores without any manual thread management.
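To make the blueprint idea concrete, here is a minimal sketch of a kernel and its launch. The names `double_elements` and `double_on_device` are hypothetical, chosen only for illustration, and the sketch assumes a single block of at most 1024 threads so that `threadIdx.x` alone is enough to identify each element. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// __global__ marks a kernel: it is launched from the host and runs on the device.
// The body describes the work for ONE element; the launch decides how many
// threads execute it.
__global__ void double_elements(float* data, int n) {
    int i = threadIdx.x;  // single-block launch, so the local index is sufficient
    if (i < n) {
        data[i] *= 2.0f;
    }
}

// Hypothetical host helper: copies the input to the device, runs one block of
// n threads, and copies the result back. Assumes 1 <= n <= 1024.
std::vector<float> double_on_device(std::vector<float> values) {
    int n = static_cast<int>(values.size());
    size_t bytes = n * sizeof(float);

    float* d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, values.data(), bytes, cudaMemcpyHostToDevice);

    double_elements<<<1, n>>>(d_data, n);  // no explicit thread management

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(values.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return values;
}

int main() {
    std::vector<float> result = double_on_device({1.0f, 2.0f, 3.0f, 4.0f});
    for (float v : result) {
        std::printf("%.1f ", v);
    }
    std::printf("\n");
    return 0;
}
```

Notice that the kernel contains no loop: the iteration over elements is expressed by the launch, not by the function body.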

2. The Global Address Resolver

So how does one thread out of millions find its target? It relies on a fixed convention known as the indexing formula:

$$\text{threadID} = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$

This formula acts as a coordinate system: it maps the software's logical data (the array) onto the hardware's physical hierarchy (blocks and threads).
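The formula can be checked directly with a small experiment. In the sketch below, each thread computes its global index and writes it into the array slot it owns; if the mapping works as described, slot i ends up holding i. The helpers `global_index`, `record_indices`, and `collect_indices` are hypothetical names introduced for this illustration, and the block and thread counts are arbitrary.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The indexing formula as a plain function, usable on both host and device.
__host__ __device__ inline int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Each thread writes its own global index into the slot it owns.
__global__ void record_indices(int* out, int n) {
    int i = global_index(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < n) {
        out[i] = i;
    }
}

// Hypothetical helper: launches `blocks` blocks of `threads` threads and
// returns what every thread wrote.
std::vector<int> collect_indices(int blocks, int threads) {
    int n = blocks * threads;
    std::vector<int> host(n, -1);  // -1 marks a slot no thread has written

    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    record_indices<<<blocks, threads>>>(d_out, n);
    cudaMemcpy(host.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return host;
}

int main() {
    // Block 3, thread 5, with 128 threads per block: 3 * 128 + 5 = 389.
    std::printf("global_index(3, 128, 5) = %d\n", global_index(3, 128, 5));

    std::vector<int> seen = collect_indices(4, 8);  // 4 blocks of 8 threads
    bool ok = true;
    for (int i = 0; i < static_cast<int>(seen.size()); ++i) {
        ok = ok && (seen[i] == i);
    }
    std::printf("every slot holds its own index: %s\n", ok ? "yes" : "no");
    return 0;
}
```

Because every thread gets a distinct index, no two threads touch the same slot, which is why the kernel needs no coordination between threads.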

3. Execution Configuration

The <<<B, T>>> parameters define the shape of the grid: B blocks of T threads each. This provides transparent scalability: your code runs with the same logic whether the hardware has 2 SMs or 80 SMs.
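One common way to choose B for an array of n elements is ceiling division by T, paired with the `if (i < n)` guard that the quiz asks about. The sketch below assumes that approach; `blocks_for`, `add_vectors`, and `run_add` are hypothetical names, and the sizes (n = 1000, T = 256) are picked so that the last block has spare threads.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Smallest block count B such that B * threads_per_block >= n.
inline int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

__global__ void add_vectors(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the array
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: adds two test vectors on the device and returns the
// number of elements that do not match the expected sum.
int run_add(int n, int threads_per_block) {
    std::vector<float> a(n), b(n), c(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * static_cast<float>(i);
    }

    size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = blocks_for(n, threads_per_block);
    add_vectors<<<blocks, threads_per_block>>>(d_a, d_b, d_c, n);

    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != 3.0f * static_cast<float>(i)) {
            ++mismatches;
        }
    }
    return mismatches;
}

int main() {
    const int n = 1000;  // deliberately not a multiple of T
    const int T = 256;
    std::printf("B = %d blocks of T = %d threads\n", blocks_for(n, T), T);
    std::printf("mismatches: %d\n", run_add(n, T));
    return 0;
}
```

With n = 1000 and T = 256, the launch uses 4 blocks, or 1024 threads, so 24 threads fail the guard and do nothing. The launch itself never mentions the number of SMs: the hardware distributes the blocks over however many are available.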

QUESTION 1

What is the primary role of the __global__ qualifier?

To define a function that runs on the CPU and is called by the GPU.

To mark a function as a kernel that is callable from the host and executes on the device.

To synchronize all threads across the entire GPU grid.

To allocate memory in the global memory space.

QUESTION 2

If blockIdx.x = 2, blockDim.x = 256, and threadIdx.x = 10, what is the global index?

266

512

522

778

QUESTION 3

What does 'Transparent Scalability' imply in CUDA?

The memory automatically scales with the size of the input array.

The same code can run on different GPUs with varying SM counts without modification.

Threads can see into the registers of other threads.

The kernel speed increases linearly with the clock speed of the CPU.

QUESTION 4

Why is the if (i < n) check necessary in a kernel?

To prevent the GPU from overheating.

To ensure threads do not access memory outside the valid array bounds.

To check if the kernel is running on the correct SM.

To synchronize memory access between threads.

QUESTION 5

Which variable represents the number of threads within a single block?

gridDim.x

blockIdx.x

blockDim.x

threadIdx.x
